The Armed Services Vocational Aptitude Battery (ASVAB) is used by all of the U.S. military services for enlistment qualification and to classify enlistees into military occupations. Because some military jobs change over time, joint-service collaborations have researched how to augment the breadth of the domains and constructs measured by the ASVAB. The current battery predominantly measures academic achievement (math, verbal, science) and technical knowledge (mechanical, electronics, auto/shop). Although the ASVAB contains tests designed to measure the aptitude domain related to training performance in military jobs, much of its content is also linked to job knowledge and job performance constructs. Strengthening the relationship between the aptitudes, abilities, and learning capabilities measured by the ASVAB and military performance makes it possible to assign individuals more accurately to occupations in which they are likely to succeed, thereby lowering military costs and the personal costs associated with failure.

All of the services have conducted personnel selection and classification research over the years, with a major objective of expanding the ASVAB. The most comprehensive effort was the Army’s Project A, which expanded not only the predictor domain but also the military performance domain, or criterion space, upon which the predictors would be validated (Buscigilo, Palmer, King, & Walker, 1994; Campbell & Zook, 1992; Russell & Peterson, 2001). Another large research effort was a joint-service project that capitalized on the technology that launched the computer adaptive version of the ASVAB, the CAT-ASVAB. This project was known as the Enhanced Computer-Administered Test (ECAT) battery (Alderton, Wolfe, & Larson, 1997). The ECAT project was “. . . driven by cognitive theories of aptitude, working memory, and mental imagery” (Alderton et al., 1997, p. 7), specifically, Carroll’s (1993) theory of cognitive abilities. At the culmination of the ECAT project, only one test was chosen for addition to the ASVAB: the spatial ability test, Assembling Objects (AO). The major reasons at the time for selecting the AO test were a small but meaningful degree of incremental validity for some of the studied occupations, a demonstrated reduction in adverse impact, and the fact that the test could be administered in both paper-and-pencil and computer formats, whereas other ECAT tests could not. For a full discussion of the ECAT battery, see the special 1997 Military Psychology issue (Volume 9, Number 1) dedicated to the ECAT.

With regard to ASVAB and ECAT construct overlap and unique ECAT construct measurement, the Navy conducted several factor analyses (Alderton et al., 1997) that varied in extraction method, rotation method, number of factors extracted, and initial communality estimates. The most representative structure came from a hierarchical factor solution favoring Carroll’s (1993) structure of a general ability factor and orthogonal (unrelated) specific abilities. When the two batteries were factor analyzed separately, the ASVAB showed an overarching general ability factor with four clear lower-level factors: Technical Knowledge, Verbal Ability, Clerical Speed (which contained the Numerical Operations [NO] and Coding Speed [CS] tests), and Mathematics Ability. In contrast, the ECAT, which also had an overarching general ability factor, showed different lower-level factors than were observed in the ASVAB: Spatial (which contained AO, among other tests measuring spatial ability), Psychomotor, and Working Memory.

More recently, an external ASVAB review panel with expertise in personnel selection, job classification, psychometrics, and cognitive psychology met to consider the current ASVAB’s content and the testing research conducted by the military personnel research laboratories (Drasgow, Embretson, Kyllonen, & Schmitt, 2006). As part of the panel’s evaluation, Drasgow and colleagues also examined ASVAB content in light of Carroll’s (1993) stratum theory of the structure of intellect and conducted confirmatory factor analyses of the ASVAB tests based on the Spearman-Holzinger bifactor model (Holzinger & Harman, 1941). Drasgow and colleagues found, as have others (e.g., Ree & Carretta, 1994), a strong general factor for the ASVAB dominated by the verbal and math tests, which they interpreted as crystallized intelligence (Gc). Crystallized intelligence loads strongly on language skills (e.g., vocabulary) and education (general and specific knowledge) and therefore reflects intellectual achievement that, in turn, depends somewhat on access to quality education or specialized knowledge. Gc may also be linked to socioeconomic status, interests, and/or opportunity. In contrast, the ECAT is considered by psychologists to measure fluid intelligence (Gf). Gf can be described as the ability to think logically and solve problems in novel situations independent of knowledge acquired through education, learning, or experience. Given the increasingly diverse youth population and the emphasis of several emerging military occupations on the ability to think logically and solve problems (e.g., cyber occupations), it seems appropriate for the ASVAB to contain more measures of fluid intelligence than just AO.

CS is a former ASVAB test that is currently administered as a special classification test to Navy applicants. Although CS is under the umbrella of Gf, it may be viewed more as a process-based or processing-perceptual speed test, where performance depends on the speed and accuracy with which individuals perform simple information processing tasks (Ackerman & Cianciolo, 2000). By process-based, we mean that the test content is uncomplicated and incidental to the ability being measured. Process-based measures like CS that do not rely on learned content have contributed to military personnel selection batteries since World War I. Dockeray and Isaacs (1921) reported that both Italy and France included measures of reaction time (RT) in their pilot selection batteries; RT is a slightly different construct than CS but nevertheless measures speed. Thurstone’s work on the identification of primary mental abilities (Thurstone, 1938; Thurstone & Thurstone, 1941) provided further support for the importance of process-based measures and Gf through the identification of perceptual speed, memory, and space factors.

In support of augmenting the ASVAB content, many theorists have proposed that including and differentially weighting tests that measure specific abilities important for occupational areas should result in better prediction of occupational performance than depending on measures of general cognitive ability alone. This hypothesis is referred to as specific aptitude theory or differential aptitude theory (Hull, 1928; Thurstone, 1938). The influence of differential aptitude theory is reflected in the development of taxonomies of human abilities (e.g., Fleishman, Quaintance, & Broadling, 1994) and in military multiple-aptitude test batteries that are, to some extent, an outgrowth of these taxonomies. For example, tests of spatial ability and processing speed that do not rely on learned content have for many years been a mainstay of multiple-aptitude aircrew test batteries (Carretta & Ree, 2003), such as the Air Force Officer Qualifying Test (Drasgow, Nye, Carretta, & Ree, 2010) and other aircrew aptitude batteries (Carretta & Ree, 2003), and have periodically appeared on the ASVAB and service-specific batteries (Rumsey, 2012). Further, with regard to speeded tests, Alf and Gordon (1957) demonstrated the broader application of so-called simple clerical tests for military occupations when they found that a Navy clerical composite had higher validity (r = .40) for predicting the fleet performance of Navy frogmen (an early designation for Navy SEALs [Sea, Air, and Land]) than did knowledge-based tests.

The influence of differential aptitude theory is still pervasive and has been adopted by the Army in the development of differential assignment theory (DAT; Johnson & Zeidner, 1995; Zeidner & Johnson, 1991, 1994). DAT is a multifaceted theoretical framework firmly grounded in classification principles that considers both predictive validity (stressed in general mental ability theories) and differential validity (specific ability measures contributing incremental validity over general mental ability). The application of DAT is intended to improve the process of optimally matching people to jobs and has been incorporated into the Army’s enlisted personnel classification algorithm (Johnson & Zeidner, 1995; McWhite & Greenston, 1998). DAT is discussed later in the paper in the context of classification efficiency. Table 1 provides brief descriptions of the ASVAB tests, including AO, and of the CS special classification test, which is currently delivered only on the CAT-ASVAB platform to Navy applicants.


Table 1
Description of the ASVAB and Coding Speed Tests

Test name and abbreviation        Test description
--------------------------------  --------------------------------------------------
General Science (GS)              Knowledge of physical and biological sciences
Arithmetic Reasoning (AR)         Ability to solve arithmetic word problems
Word Knowledge (WK)ᵃ              Ability to select the correct meaning of words
                                  presented in context and correct synonyms
Paragraph Comprehension (PC)ᵃ     Ability to obtain information from written passages
Mathematics Knowledge (MK)        Knowledge of high school mathematics principles
Electronics Information (EI)      Knowledge of electricity and electronics
Auto and Shop Information (AS)    Knowledge of automobile and shop technologies,
                                  tools, and practices
Mechanical Comprehension (MC)     Knowledge of mechanical and physical principles
Assembling Objects (AO)ᵇ          Ability to determine correct spatial forms from
                                  their separate parts and connection points
Coding Speed (CS)ᵇ                Ability to quickly identify correct word/number
                                  pairings from a key with many options

Note. ASVAB = Armed Services Vocational Aptitude Battery.
ᵃ WK and PC are combined to form the Verbal (VE) composite that is a component of the AFQT and several Navy ASVAB classification composites.
ᵇ Not all recruits enter the Navy with AO and CS test scores. CS is only given by computer at the MEPS at the end of the computer-administered CAT-ASVAB. AO, also given on the CAT-ASVAB, is not given to high school students taking the paper-and-pencil version of the ASVAB under the Career Exploration Program, but is given in paper-and-pencil ASVAB forms in the Enlisted Testing Program.

As explained earlier, the ASVAB has undergone several content changes since its implementation in the 1970s, and there is full support by the Department of Defense (DoD) and the services for further change. Drasgow et al. (2006) made many recommendations regarding the content of the battery. They recommended a review of the ASVAB content, a revisit of the ECAT results, and consideration of new content, including measures of noncognitive characteristics, a technical knowledge test of information/communications technology literacy, and enhanced measurement of nonverbal reasoning. The rationale for including nonverbal reasoning tests is to expand the breadth of the measurement of general mental ability, improve classification effectiveness, reduce adverse impact, and improve the assessment of cognitive ability and trainability in applicants challenged in English skills (e.g., non-native English speakers; Drasgow et al., 2006). The AO spatial ability test, depending on the type of factor analysis, can be considered somewhat of a nonverbal reasoning test and is the only such test included in the current ASVAB. The DoD is now preparing to evaluate several nonverbal reasoning tests, including a working memory test. The CS test, a former ASVAB test, is not considered a nonverbal reasoning test but has its own merits and has been revamped to address some issues discussed in this paper.

Purpose 

The purpose of this paper is to provide the history of the CS and AO tests and the supporting theoretical and empirical evidence for their use in military occupational classification. Studies are reviewed that focus on (a) incremental validity when used in combination with the academic/technical knowledge-based ASVAB tests for predicting military training performance criteria, (b) reduced subgroup differences (adverse impact) for women and racial/ethnic minority groups, and (c) improved classification in terms of matching recruits to occupations. Although the analyses reported here are not exhaustive, they provide evidence of the utility of CS and AO and insights regarding the likely benefits of measures of reduced verbal content or process-based tests in supplementing the ASVAB verbal, math, and technical knowledge tests.


History and Use of CS and AO in U.S. Military Personnel Selection and Classification

The following sections describe the history of the Coding Speed (CS) and Assembling Objects (AO) tests and the theoretical and empirical evidence supporting their use in military occupational classification.

Coding  Speed 

The Army developed the earliest-known U.S. military test of “coding speed” used operationally for military occupation classification (Helme, Graham, & Anderson, 1962). Helme and colleagues described the Army Clerical Speed Test, which closely resembles the former ASVAB CS test. The Navy subsequently modified the Clerical Speed Test and adopted it as part of its Basic Test Battery. In 1976, the first joint-service ASVAB forms (Forms 6 and 7) were introduced for enlistment qualification and classification, and they contained a different clerical speed test, Attention to Detail (AD). The AD test was subsequently considered suboptimal in predictive validity and classification utility, so in October 1980, AD was replaced by CS in ASVAB Forms 8, 9, and 10.

From 1980 to 2002, the ASVAB contained two speeded tests, NO and CS, with CS used most widely in classifying military recruits to clerical occupations (Weltin & Popelka, 1983), but with the Army and Navy using the tests for a variety of occupations. Both tests were eliminated from the battery in 2002 because of problems associated with speeded tests; mainly, examinees’ scores were sensitive to changes in test format and item response input modes. For example, in their paper-and-pencil format, NO and CS scores were affected when the answer sheet with round bubbles for marking responses was replaced with one that had narrow, vertically placed rectangles (Bloxom, Thomasson, Wise, & Branch, 1993; Ree & Wegner, 1990). Responses took less time to mark on the answer sheet with rectangles because a single up-and-down stroke of the pencil filled in a rectangle, whereas filling in a circle required a slower, more careful circular motion. Given that the NO and CS tests were scored as the number of items correct under a time limit, examinees with the rectangle answer sheets had, on average, more correct responses than examinees with the bubble answer sheets.

NO and CS score impact issues again became a concern when the ASVAB became computer adaptive (CAT-ASVAB). To study potential score impacts on NO and CS, a CAT-ASVAB computer hardware effects study was designed with conditions that varied computer features (CPU, monitor size, response input device, color scheme, and portability; Segall, 1997). The study showed that NO was sensitive to both response input device (e.g., keypad vs. the existing template-covered keyboard specially designed for CAT-ASVAB) and computer portability (e.g., subnotebook vs. desktop PC). The CS test was sensitive only to portability (with acknowledgment that the exact features that caused the score differences would be hard to determine), and it appeared that only the speed component was affected, not the accuracy component (Segall, 1997, p. 226). It should be noted that no statistically significant answer sheet effects or computer hardware effects were observed for the ASVAB power tests (Bloxom et al., 1993), and the ASVAB tests have since been considered robust to platform changes, including Internet delivery.

The Defense Manpower Data Center (DMDC) and the Navy paid particular attention to CS during the evaluation of the speeded tests because of some of the documented benefits for enhancing the military classification systems discussed in this paper. One of DMDC’s supporting efforts for both NO and CS was the development of a more robust “rate score” to replace the simple number-correct score (Segall, Moreno, Bloxom, & Hetter, 1997). The rate score is essentially the average number of items correct per minute, corrected for guessing (e.g., fast random responding) and factoring in an adequate screen display time. As explained by Segall et al. (1997, pp. 137-138), the rate score is more suitable for speeded tests where changes in aspects of the test and delivery platforms require consideration of the test time limit. A recent concern for CS was the replacement of the specially configured CAT-ASVAB keyboard with a mouse for response input. Mouse input was expected to produce faster responses because examinees would not need to look down at the keyboard for the correct key (A, B, C, D, or E) to press; instead, the response choices are displayed low on the computer screen and are simply clicked. At the time of this writing, DMDC had completed its study of CS score differences between response input modes and found no differences and thus no need for a special CS score equating (DMDC briefing given to Navy on October 30, 2013).
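
The general idea of a rate score can be illustrated with a short sketch. The exact formula in Segall et al. (1997) is not reproduced in this paper, so the code below substitutes the classical correction for guessing (rights minus wrongs divided by the number of alternatives minus one) and divides by elapsed minutes. The function name, its parameters, and the five-alternative default are assumptions made for illustration only, and the sketch does not model the screen display time adjustment mentioned above.

```python
def rate_score(n_correct, n_wrong, minutes, n_choices=5):
    """Illustrative guessing-corrected items-correct-per-minute score.

    Assumption: uses the classical formula-score correction
    (rights - wrongs / (k - 1)); this is NOT the published DMDC formula.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    if n_choices < 2:
        raise ValueError("n_choices must be at least 2")
    # Each wrong answer removes the credit expected from blind guessing.
    corrected = n_correct - n_wrong / (n_choices - 1)
    return corrected / minutes


# Hypothetical examinees (made-up numbers): a careful responder and a
# fast random responder working for the same 7 minutes.
careful = rate_score(n_correct=60, n_wrong=4, minutes=7)
random_fast = rate_score(n_correct=20, n_wrong=80, minutes=7)
print(careful, random_fast)
```

Under this correction, fast random responding earns no credit on average, which is the property that makes a rate-type score less sensitive to response strategy than a simple number-correct score.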

Aside from the CS rate score change, other improvements have been made over time to make CS a more robust test. One of the improvements, made by DMDC, involved simplifying the test’s instructions, because there was evidence that some of the CS score variance was due to individual differences in the ability to understand them. Also during this CS review period, contract support focused on the computerized version of CS, which included formatting changes (to more closely resemble the paper-and-pencil version) and increased opportunity to review the revised instructions and engage in the practice items (Abrahams et al., 1996). In addition, Abrahams and Alf (2001) compared several well-known perceptual speed tests, with results that supported the construct validity of the CS test, and found further support in the early work of Ghiselli (1966), which showed that measures of perceptual speed were useful for predicting both training and job performance.

With all of the attention given to CS and the Navy’s empirical evidence supporting the test, the Navy was able to retain CS as a special classification test administered seamlessly at the end of the CAT-ASVAB. In 2004, DMDC scaled the four computerized CS forms to the newest ASVAB normative population score scale (Segall, 2004). From that time to the present, CS has not shown indications of compromise or score drift even though the original paper-and-pencil items (four forms) were retained for the computerized version. CS is now administered to all Navy applicants testing on the CAT-ASVAB at all of the Military Entrance Processing Stations (MEPS), where the computer hardware features are not widely disparate. CS, however, is not administered at any of the Military Entrance Test (MET) sites. MET sites are generally more remotely located than MEPS, are lower volume, and do not administer special tests. In the last few years, the addition of computers with Internet connectivity has converted about 50% of the MET sites to Web-based administration of the CAT-ASVAB. The CS test could be administered at these now-Web-based MET sites; however, DMDC would have to conduct another CS study to determine whether Internet delivery of the items (which sometimes lags) affects CS scores and, if so, how to control for the effect. At this point, because not all Navy applicants are administered the CS or AO tests, every rating whose operational classification composite includes either test must also have an alternative ASVAB composite that does not contain these tests.

Assembling  Objects 

The Army also developed the AO test, but at a different time than CS. AO was developed during the Army’s Project A (Buscigilo et al., 1994; Campbell & Zook, 1992; Russell & Peterson, 2001). One of the first steps in Project A was the identification of abilities and characteristics important to Army occupations that were not measured by the ASVAB. Spatial ability was identified as a key area. Several spatial constructs were identified; 10 spatial tests were developed, six of which survived field testing and were included in validation studies (Russell et al., 2001).

Factor analyses of the Project A spatial tests indicated the presence of a general spatial factor and that reasoning and assembly type items were the best measures of this factor. Additional analyses revealed that there were small or no gender differences for the spatial tests AO and Figural Reasoning (FR; Peterson et al., 1990). Further, in a study of the effects of practice and coaching on test performance, only small-to-moderate mean score improvements were observed for AO and FR (Buscigilo & Palmer, 1996). Both AO and FR were included in the DoD’s ECAT project (Alderton et al., 1997). Analyses of the ECAT data showed that AO could increment the validity of the ASVAB for predicting job performance and improve classification of personnel into some military occupations (Sager, Peterson, Oppler, Rosse, & Walker, 1997; Wolfe, Alderton, Larson, Bloxom, & Wise, 1997). The AO test subsequently became an ASVAB test in 2002 when NO and CS were eliminated.

It should be noted that, on a theoretical basis, the Navy, in its use of AO and CS, has not combined the two tests in the same classification composite. One reason is that the tests are considered measurements of separate constructs linked to different occupations. AO measures the ability to visually construct spatial forms from the forms’ parts and also to identify connection points of form parts. On the face of it, these types of test items map well to tasks performed in mechanical occupations (Held, Fedak, & Johns, 2004). CS, on the other hand, requires quick and accurate thinking, which applies to many operations types of occupations in addition to clerical (e.g., Navy SEALs). The AO and CS tests, in more comprehensive analyses, will be evaluated in combination in the future across a wide variety of military occupations.

The second reason for not combining AO and CS in the same classification composite is logistical in that not all Navy applicants are administered both tests. For example, Navy applicants testing on the paper-and-pencil version of the ASVAB receive AO but do not receive CS. Further, those who take the ASVAB in the high school testing program (Career Exploration Program, currently administering ASVAB in paper-and-pencil) do not receive either AO or CS.

The third reason for not combining AO and CS in the same composite is that initial validity analyses with data available for both tests were not supportive. For example, in the ECAT study, Held and Wolfe (1997) added the “best” ASVAB test to operational ASVAB classification composites (two to four tests in the composites) and compared the incremental validity with that provided by the best ECAT test. The ECAT incremental validity results showed AO did not add to the ASVAB operational composite for the six occupations that used CS in their ASVAB composite (Held & Wolfe, 1997, p. 81).
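
To make the notion of incremental validity concrete, the following sketch computes the validity of a unit-weighted composite of standardized tests from a correlation matrix and then the gain from adding one more test. It is a minimal illustration, not the procedure used by Held and Wolfe (1997): the unit weighting, the function names, and all correlation values are assumptions introduced here.

```python
import numpy as np


def composite_validity(r_xx, r_xy, weights=None):
    """Correlation between a weighted sum of standardized tests and a criterion.

    r_xx: test intercorrelation matrix; r_xy: test-criterion correlations.
    Unit weights are assumed when `weights` is None.
    """
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    w = np.ones(len(r_xy)) if weights is None else np.asarray(weights, dtype=float)
    return float(w @ r_xy / np.sqrt(w @ r_xx @ w))


def incremental_validity(r_xx, r_xy, base_idx, added_idx):
    """Validity gain from adding one test to a unit-weighted base composite."""
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    base = list(base_idx)
    full = base + [added_idx]
    v_base = composite_validity(r_xx[np.ix_(base, base)], r_xy[base])
    v_full = composite_validity(r_xx[np.ix_(full, full)], r_xy[full])
    return v_full - v_base


# Hypothetical correlations, for illustration only (not values from any study).
r_xx = [[1.0, 0.6, 0.3],
        [0.6, 1.0, 0.2],
        [0.3, 0.2, 1.0]]
r_xy = [0.45, 0.40, 0.35]
gain = incremental_validity(r_xx, r_xy, base_idx=[0, 1], added_idx=2)
print(f"Incremental validity of the added test: {gain:.3f}")
```

A candidate test adds the most when it correlates with the criterion but only weakly with the tests already in the composite, which is why a test can fail to add validity to a composite that already contains a related measure.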

The Army has conducted extensive analyses on the AO test and found that it has potential to be included in their classification systems. Further, it has been suggested that AO could be used in a revised version of the ASVAB Armed Forces Qualification Test (AFQT¹), which is used for military enlistment qualification (Anderson et al., 2011). We expect that the addition of AO to the AFQT is not likely but that the Army and the other services will find the AO test useful in occupational classification.

¹ AFQT scores are calculated from a linear combination of the ASVAB verbal (PC and WK) and math (AR and MK) standardized scores and are reported as percentiles: $\mathrm{AFQT} = S_{AR} + S_{MK} + 2S_{VE}$, where VE (Verbal) is a weighted composite of the PC and WK tests (Segall, 1997). In addition to the AFQT, the services screen military applicants on education, mental, moral, and physical factors.
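
The AFQT footnote describes the score as a linear combination of the verbal and math standard scores that is reported as a percentile. The sketch below shows that two-step structure using the standard weighting AR + MK + 2VE. Operational percentile conversion relies on official norming tables that are not given in this paper, so the lookup against a small norming sample, along with every function name and number shown, is a hypothetical stand-in.

```python
from bisect import bisect_right


def afqt_raw(ar, mk, ve):
    """Raw AFQT composite from standard scores: AR + MK + 2 * VE."""
    return ar + mk + 2 * ve


def percentile_from_norms(raw, norm_sample):
    """Percentile of `raw` within a (hypothetical) norming sample.

    Returns the percent of the sample scoring at or below `raw`,
    bounded to the 1-99 range in which AFQT percentiles are reported.
    """
    ranked = sorted(norm_sample)
    pct = 100.0 * bisect_right(ranked, raw) / len(ranked)
    return min(99, max(1, round(pct)))


# Made-up standard scores and a made-up norming sample, for illustration only.
raw = afqt_raw(ar=52, mk=55, ve=48)
norms = [150, 170, 185, 195, 200, 205, 215, 225, 240, 260]
print(raw, percentile_from_norms(raw, norms))
```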

Criteria for Evaluating the Utility of AO and CS

There are well-established professional guidelines regarding the development and use of tests for personnel measurement and selection (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999; Society for Industrial and Organizational Psychology, 2003). As stated earlier, in this paper we focus on three test evaluation factors of particular importance to the U.S. military: (a) incremental validity when used in combination with the ASVAB for predicting important performance criteria, (b) reduced subgroup differences (adverse impact) for women and at least some racial/ethnic minority groups, and (c) improved classification in terms of matching recruits to occupations.

When proposing that new content be added to the ASVAB, Drasgow et al. (2006, p. 25) emphasized the potential benefits of reducing adverse impact/expanding the applicant pool and improving classification efficiency (CE). Drasgow et al. referred to CE in the overarching context of classification theory and the Army’s DAT; that is, tapping into relevant aptitudes, abilities, and skills that individuals do not have to the same degree and that apply more strongly to different occupational groups. Although Drasgow et al. downplayed the importance of incremental validity for new tests (e.g., measures could have the same predictive validity as currently observed for the ASVAB but may benefit the military in other ways), we provide evidence that incremental validity is achievable. In practice, however, we recognize that new test content may produce mixed results. For example, the addition of a psychomotor test to the ASVAB might provide incremental validity but increase adverse impact for women and also increase administration costs due to the requirement for specialized input devices, which was found to be the case in the evaluation of the ECAT psychomotor tests.


The  decision  to  supplement  the  ASVAB  with 
new  content  must  include  weighing  positive  and 
negative  impacts  on  several  factors,  not  just  one 
or  two.  Eor  example,  as  pertains  to  the  focus  of 
this  paper,  a  new  measure  should  demonstrate 
predictive  validity  for  more  than  one  occupation 
or  the  measure  is  not  cost-effective,  at  least  for 
broad  applicant  administration.  Also,  it  would 
be  desirable  for  the  test  to  predict  more  than  one 
performance  measure,  not  just  training  grades 
(e.g.,  work  samples,  supervisor  and  peer  ratings, 
and  attrition/retention).  The  Navy,  however, 
stresses  prediction  of  performance  in  training  as 
the  most  relevant  criterion  because  so  many  of 
their  ratings  are  technically  complex  and  failure 
costs  at  this  point  are  high.  The  Army  has  taken 
a  more  comprehensive  approach  and  has  led  the 
way in the measurement of posttraining
performance, predominantly in Project
A,  but  also  in  their  more  recent  evaluation  of 
noncognitive  measures  that  map  better  to  job 
performance  than  to  training  performance. 

Incremental  Validity  to  the  ASVAB 

Coding  Speed.  Table  2  summarizes  valid¬ 
ity  coefficients  for  predicting  final  school 
grades  in  training  for  several  Navy  ratings  for 
ASVAB  composites  with  and  without  CS. 
(Navy  ratings  are  enlisted  occupations  similar 
to  Army  and  Marine  Corps  military  occupa¬ 
tional  specialties  and  Air  Force  specialties.)  The 
validities  in  Table  2  were  corrected  for  range 
restriction using the multivariate method (Lawley,
1943) as applied in the military ASVAB
context (Held & Foley, 1994) but not for criterion
unreliability. Table 2 does not include two Navy
composites that contained NO because the
composites did not show incremental validity over the
evaluation (baseline) composite,^ VE + MK.
During the DoD evaluation of the speeded tests,
it was suggested that the baseline composite
(VE + MK) was an adequate replacement for
service  clerical  composites  that  contained  either 
NO  or  CS. 
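
The multivariate correction cited above is matrix-based and is not reproduced here. As a simplified stand-in, the following sketch applies the univariate correction for direct range restriction (commonly called Thorndike's Case II) to a single predictor. The function name and the example values are illustrative assumptions and are not taken from Table 2.

```python
import math


def correct_direct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Univariate correction for direct range restriction (Thorndike Case II).

    This is a simplified stand-in for the multivariate correction named in
    the text; it handles one predictor that was used directly for selection.
    """
    u = sd_unrestricted / sd_restricted  # ratio of applicant SD to selected-sample SD
    r2 = r_restricted ** 2
    return (r_restricted * u) / math.sqrt(1.0 - r2 + r2 * u ** 2)


# Hypothetical example: observed validity of .30 in a selected sample whose
# predictor SD is two thirds of the SD in the full applicant population.
corrected = correct_direct_range_restriction(0.30, sd_unrestricted=1.5, sd_restricted=1.0)
print(round(corrected, 3))
```

When the two standard deviations are equal, the function returns the observed correlation unchanged, which is a quick sanity check on the formula.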

^ The Word Knowledge (WK) and Paragraph Comprehension
(PC) test standard scores are combined to create a
weighted Verbal (VE) composite. MK is the Math Knowledge
test.

The validity results in Table 2 were presented
to the Manpower Accession Policy Working
Group (MAPWG) and the Defense Advisory
Committee-Military Personnel Testing (DAC-MPT)
during the 1990s. The MAPWG consists
of  representatives  from  the  services,  the  U.S. 
Military  Enlistment  Command,  the  DMDC,  and 
the Office of the Secretary of Defense, Accession
Policy. The MAPWG’s responsibilities include
resolving issues related to ASVAB test
development,  implementation  and  maintenance, 
and  making  policy  recommendations.  The 
DAC-MPT  is  an  independent  advisory  group 
composed of volunteer experts in psychometrics,
statistics, and test development. The DAC-MPT’s
responsibilities are to review the test
development  methods  and  calibration  of  the 
ASVAB  and  other  military  personnel  selection 
and  classification  tests,  but  also  to  review  va¬ 
lidity  results. 

The  composites  of  concern  for  the  validity 
comparisons  in  Table  2  were  the  Navy’s  oper¬ 
ational  VE  +  MK  +  CS  composite  and  the 
DoD  suggested  replacement,  VE  +  MK.  All 
composite tests were unit-weighted. The Navy
composite with CS showed, on average, .02
higher predictive validity than the VE + MK
composite. A .02 increment in predictive validity
may seem small, but in large-scale testing
programs such as the ASVAB, it can translate to
substantial benefits both in terms of a reduction
in training attrition and in associated costs
(Schmidt, Dunn, & Hunter, 1995). For example,
a  .02  increment  in  predictive  validity  for  train¬ 
ing  completion  for  the  personnel  selection  sce¬ 
nario  of  40%  of  ASVAB  youth  qualified  for  the 
occupation  of  air  traffic  controller  would  trans¬ 
late  to  a  1.5%  expected  improvement  in  the 
training  completion  rate  given  certain  parame¬ 
ters  (Taylor  &  Russell,  1939).  These  parameters 
could  be,  for  example,  (a)  a  25%  selection  ratio 
(qualified  youth  resulting  from  the  operational 
selection  instrument  with  cut  score),  (b)  an  op¬ 
erational  selection  composite  criterion-related 
(predictive)  validity  of  .70  (predictive  of  final 
school  grade  in  training  that  determines  success 
and  failure),  (c)  a  .02  validity  improvement  for 
the  candidate  replacement  composite,  and  (d)  an 
observed 83% training completion rate (Taylor-Russell
base rate .45 table). In this cost-benefit
scenario,  at  a  $100,000  training  cost  per  enlistee 
and  1,000  recruited  for  the  occupation,  15  fewer 
recruits  would  be  expected  to  fail  training 
merely  due  to  the  .02  validity  increment  in  the 
selection  composite.  The  expected  cost  savings 
for air traffic controllers under these conditions
would be $1.5 million ($100,000/enlistee × 15
enlistees).  This  amount  of  savings  is  for  only 
one  of  many  Navy  ratings.  Considerably  higher 
cost-avoidance  savings  would  occur  if  similar 
validity  increments  could  be  realized  for  several 
more  ratings. 
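
The arithmetic in this example can be sketched as follows. The first function approximates a Taylor-Russell success ratio by numerically integrating a bivariate-normal model; treating .25 as the selection ratio and .45 as the base rate is our reading of the parameters listed above, and the result only approximates the published tables. The second function reproduces the cost calculation from the text. All function names are illustrative.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def success_ratio(validity, selection_ratio, base_rate, steps=4000):
    """Approximate P(success | selected) under a bivariate-normal model."""
    x_cut = _STD.inv_cdf(1.0 - selection_ratio)  # predictor cut score in z units
    y_cut = _STD.inv_cdf(1.0 - base_rate)        # criterion pass mark in z units
    resid_sd = (1.0 - validity ** 2) ** 0.5
    upper = 8.0                                  # stands in for +infinity
    h = (upper - x_cut) / steps
    joint = 0.0
    for i in range(steps):                       # midpoint-rule integration
        x = x_cut + (i + 0.5) * h
        p_success = 1.0 - _STD.cdf((y_cut - validity * x) / resid_sd)
        joint += _STD.pdf(x) * p_success * h
    return joint / selection_ratio


def training_cost_savings(n_recruits, completion_gain, cost_per_recruit):
    """Return (fewer expected failures, expected cost savings)."""
    fewer_failures = n_recruits * completion_gain
    return fewer_failures, fewer_failures * cost_per_recruit


baseline = success_ratio(0.70, selection_ratio=0.25, base_rate=0.45)
improved = success_ratio(0.72, selection_ratio=0.25, base_rate=0.45)
print(f"approximate gain in completion rate: {improved - baseline:.3f}")

# Figures from the text: 1,000 recruits, a 1.5% gain, $100,000 per enlistee.
fewer, saved = training_cost_savings(1000, 0.015, 100_000)
print(f"{fewer:.0f} fewer failures, ${saved:,.0f} saved")
```

Because the savings scale linearly with the number of recruits and the cost per recruit, the same function can be applied rating by rating and the results summed.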

The  Navy  occupations  (see  Table  2)  included 
a  mix  of  clerical  and  nonclerical  (e.g.,  signal¬ 
man,  radioman,  operations  specialist,  and  dental 
technician)  ratings.  These  ratings  are  clearly 
different  from  mechanical  ratings  where  AO  is, 
on the face of it, more relevant (e.g., Aviation
Mechanic).  The  variety  of  occupations  in  Table 
2  is  consistent  with  findings  of  the  relevance  of 
clerical  speed  tests  in  predicting  performance 
for other than clerical occupations such as
frogmen/SEAL (Alf & Gordon, 1957; Held, 2011)
and  air  traffic  controller  (Held,  2006).  An 
ASVAB  classification  composite  that  includes 
CS,  VE  +  MK  +  MC  +  CS,  is  currently  used 
by  the  SEALs  and  has  been  confirmed  twice  in 
Navy  ASVAB  validation/standards  studies  as 
the  best  predictor  of  success  in  the  mentally 
challenging  SEAL  training  (Held,  2011).  In  ad¬ 
dition,  an  independent  source  outside  the  Navy 
confirmed the predictive validity of the VE +
MK + MC + CS composite as optimal for the
Navy SEALs for an entirely different dataset.^

The Navy’s VE + MK + MC + CS composite
used for SEAL classification is also used
for  the  Navy’s  air  traffic  controller  (AC)  rating. 
Two  predictive  validation  studies  were  con¬ 
ducted  for  the  Navy  air  traffic  controller  rating 
that  produced  similar  results  despite  the  large 
difference in sample sizes (N1 = 269, N2 = 71;
Held, 2006). In both air traffic controller studies,
the VE + MK + MC + CS composite had
the  largest  validity  coefficient  for  predicting 
final  school  grades  (with  a  tower  operations 
hands-on  performance  measure  component)  and 
with  about  the  same  validity  magnitude  (in  the 
.70  to  .80  range).  Also,  in  both  cases,  the  CS 
composite  showed  about  a  .02  increment  in 
validity  over  the  highest  validity  ASVAB  com¬ 
posites  that  did  not  contain  CS.  We  note  that  not 
all  ASVAB  composites  demonstrate  validity 
magnitudes  in  the  .70  to  .80  range  across  mili¬ 
tary  occupations  (.25  to  .85  for  Navy)  and  that 
the  benefits  of  a  .02  validity  increment  depend 
on  many  factors  including  the  baseline  validity 
of  the  operational  composite,  the  number  of 
enlisted personnel required for the relevant
occupations, the stringency of the cut score, and
the observed failure rate.

The  CS  test  also  has  demonstrated  incremen¬ 
tal  validity  for  Army  occupations.  A  study  of 
many  cognitive  and  noncognitive  measures 
from  the  Army’s  Project  A  showed  that  the 
inclusion  of  CS  among  the  predictors  increased 
mean  predicted  performance  across  a  broad  set 
of  occupations  (Scholarios,  Johnson,  &  Zeidner, 
1994). 

It  should  be  noted  that  the  predictive  validity 
of  CS  may  be  moderated  by  job  complexity. 
Schmidt  et  al.  (1995)  observed  that  perceptual 
measures  such  as  CS  do  not  provide  predictive 
validity  over  a  general  ability  factor  of  the 
ASVAB  in  low-complexity  occupations  but  do 
for  higher-complexity  occupations.  This  finding 
has  relevance  for  improving  the  military’s  clas¬ 
sification  systems  by  limiting  the  use  of  the  CS 
test for assignment to only moderate- to
high-complexity occupations where the validity
warrants. The question becomes how to use
measures like CS in occupational classification
when  (a)  the  job  is  complex  due  to  a  require¬ 
ment  for  technical  knowledge  in  areas  that  are 
frequently  updated  and  good  reading  compre¬ 
hension  skills  in  order  to  quickly  understand 
technical  manuals  and  (b)  when  several  occu¬ 
pations  are  competing  for  recruits  with  high 
ASVAB  scores. 

It  also  should  be  noted  that  performance  on 
the  CS  test  under  low-stakes  conditions  may  be 
a  function  of  motivation  as  well  as  ability.  The 
AFQT is obviously a high-stakes military selection
hurdle, as it determines enlistment eligibility,
whereas the ASVAB classification composites
are likely perceived as less high-stakes, as
they  do  not  affect  enlistment  qualification,  only 
job  assignment.  Segal  (2012)  examined 
ASVAB  and  CS  (then  an  ASVAB  test)  data 
from  a  nationally  representative  sample  of 
12,000 participants in the 1979 Longitudinal
Survey of Youth study (for information on the
NLSY go to http://www.bls.gov/nls/nlsy79.htm),
where no high-stakes decisions were to be
made.  Participants  were  surveyed  annually  after 
testing  regarding  their  earnings  until  1994  and 


^ “Follow on Research Findings” submitted by Gallup
Consulting, Inc., in 2011 to Director, Naval Special Warfare
Recruiting Directorate, NAVSPECWARCEN, San Diego,
CA.






biannually  afterward.  Results  indicated  that  CS 
scores  were  significantly  correlated  with  future 
earnings  of  study  participants  both  by  them¬ 
selves  and  after  controlling  for  cognitive  ability 
(e.g.,  AFQT,  educational  attainment).  Segal 
postulated  that  CS  measures  an  underlying  in¬ 
trinsic  motivational  component  related  to  test¬ 
taking  performance  and  attainment  of  higher 
income levels over time. The identification and
retention of individuals likely to remain motivated
over time is of particular interest to the
military.

Assembling  Objects.  As  with  CS,  the  AO 
test  has  shown  small  (about  .02),  but  consistent, 
incremental  validity  when  used  in  combination 
with  other  ASVAB  tests.  The  .02  incremental 
validity  results  appear  robust  as  they  have  been 
observed  for  several  military  occupations  and 
performance  criteria  in  studies  conducted  by  the 
Army (Anderson et al., 2011; Russell, Le, &
Putka, 2007), Marine Corps (Carey, 1994), and
Navy (Held, Fedak, Crookenden, & Blanco,
2002; Held et al., 2004).

Table  3  shows  the  incremental  validity  of  AO 
for predicting final school grades in various
ASVAB composites during the timeframe that
the Navy was evaluating both CS and AO for
occupational classification (Held et al., 2002).
As  with  CS,  the  validities  in  Table  3  were 
corrected  for  range  restriction  on  the  ASVAB 
using  the  multivariate  method  (Lawley,  1943) 
but were not corrected for criterion unreliability. A
bootstrap  method  was  used  to  reduce  the  influ¬ 
ence  of  outliers  (reporting  the  median  corrected 
validity  from  the  bootstrap  distribution). 
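
The bootstrap step can be sketched as follows. This is a minimal illustration, not the procedure used in the cited study: it resamples predictor-criterion pairs with replacement, computes a Pearson correlation for each resample, and reports the median. The range restriction correction is omitted for brevity, and the data are synthetic values generated only for the example.

```python
import random
from statistics import median


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_median_validity(xs, ys, n_boot=1000, seed=0):
    """Median correlation across bootstrap resamples of (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # skip degenerate resamples with no variance
        estimates.append(pearson_r(bx, by))
    return median(estimates)


# Synthetic composite scores and school grades (hypothetical data).
data_rng = random.Random(42)
xs = [data_rng.gauss(50, 10) for _ in range(200)]
ys = [0.6 * x + data_rng.gauss(0, 8) for x in xs]

sample_r = pearson_r(xs, ys)
boot_r = bootstrap_median_validity(xs, ys)
print(round(sample_r, 3), round(boot_r, 3))
```

Reporting the median rather than the mean of the bootstrap distribution makes the summary less sensitive to a few resamples dominated by outlying cases.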

As  shown  in  Table  3,  AO  demonstrated  on 
average  about  a  .02  validity  increment  across 
the  group  of  Navy  ratings  when  compared  to  the 
non-AO  ASVAB  composite  that  was  deter¬ 
mined  to  have  highest  validity.  For  example,  for 
the parachute rigger rating, the “best” composite
without AO (AR + MK + EI + GS) had a
validity of .656. Substituting AO for GS (AR +
MK + EI + AO) produced a .022 increment in
validity (to .678). For the builder rating, the best
composite without AO (AR + MC + AS) had
a validity of .628. Substituting AO for AS
(AR + MC + AO) yielded a .015 increment in
validity (to .643).

There  are  several  points  to  be  made  in  the 
Navy  builder  example.  First,  the  AR  +  MC  + 
AS composite is a Navy operational classification
composite (mechanical) that has clear overlap
in constructs by using both MC and AS (see
Table  1  for  a  description  of  these  tests).  Replac¬ 
ing  AS  with  AO  not  only  reduces  that  construct 
overlap  but  improves  the  predictive  validity  of 
the  composite  and  reduces  adverse  impact  for 
some  groups.  Second,  the  builder  rating,  which 
includes  skilled  carpenters,  plasterers,  roofers, 
and  painters,  is  an  occupation  that  is  found  in 
the  other  services,  so  incorporating  the  AO  test 
in the other services’ classification composites
should yield benefits to their classification
systems as it has for the Navy.
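
The effect of construct overlap on composite validity can be illustrated with the standard formula for the validity of a unit-weighted sum of standardized tests: the sum of the test-criterion correlations divided by the square root of the sum of all entries in the test intercorrelation matrix. The sketch below uses hypothetical correlations, not values from Table 3, to show that swapping in a less redundant test can raise composite validity even when that test's own validity is slightly lower.

```python
def unit_weighted_validity(criterion_rs, intercorrelations):
    """Validity of a unit-weighted composite of standardized tests.

    criterion_rs: correlation of each test with the criterion
    intercorrelations: full square matrix of test intercorrelations
                       (ones on the diagonal)
    """
    numerator = sum(criterion_rs)
    composite_variance = sum(sum(row) for row in intercorrelations)
    return numerator / composite_variance ** 0.5


# Hypothetical composite A: third test overlaps heavily with the second.
validity_a = unit_weighted_validity(
    [0.55, 0.50, 0.50],
    [[1.0, 0.6, 0.6],
     [0.6, 1.0, 0.7],
     [0.6, 0.7, 1.0]],
)

# Hypothetical composite B: third test replaced by a less redundant test
# with a slightly lower validity of its own.
validity_b = unit_weighted_validity(
    [0.55, 0.50, 0.45],
    [[1.0, 0.6, 0.4],
     [0.6, 1.0, 0.4],
     [0.4, 0.4, 1.0]],
)

print(round(validity_a, 3), round(validity_b, 3))
```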

Although  Table  3  provides  only  brief  de¬ 
scriptions  of  the  limited  number  and  types  of 
occupations  included  in  the  AO  validity  analy¬ 
ses,  the  major  duties  listed  show  the  relevance 
of  spatial  ability,  in  particular  of  the  type  mea¬ 
sured  by  the  AO  items  (form  construction  from 
pieces  and  connection  point  locations  for  form 
pieces).  Similar  AO  validity  results  were  ob¬ 
served  in  a  study  of  Navy  aviation  mechanics 
ratings (Held et al., 2004), where the criterion
again was final school grade. This final grade,
however,  was  considered  more  representative  of 
the  job  tasks  as  it  incorporated  hands-on  labo¬ 
ratory  performance  measures  that  were  scored 
on  a  continuous  scale. 

As  mentioned,  similar  incremental  validities 
for  composites  using  the  AO  test  have  been 
reported by the Army (Anderson et al., 2011;
Russell et al., 2007) and Marine Corps (Carey,
1994). These studies included a variety of
occupations (e.g., Army: infantryman, armor
crewman, military police, light wheel vehicle
mechanic, health care specialist, and motor
transport operator; Marine Corps: automotive
and helicopter mechanics) and several job
performance criteria including measures of
hands-on performance, job knowledge, and
final course grades.

The  AO  test  was  also  a  part  of  the  more 
recent  Army  Select21  project  that  had  the  main 
objective  of  helping  to  ensure  the  acquisition  of 
soldiers  with  the  knowledge,  skills,  and  abilities 
needed  to  perform  the  types  of  tasks  envisioned 
in  a  transformed  Army  involved  with  future 
combat  systems.  The  Select21  project  included 
a  future-oriented  job  analysis  to  support  the 
development  of  experimental  selection  and 
classification predictor measures and
performance criteria (Knapp & Tremble, 2007)
mapped  to  a  different  mix  of  knowledge,  skills, 
and abilities resulting from projected changes in
the Army force structure and job requirements.
In  this  regard,  Russell  et  al.  (2007)  evaluated  the 
predictive  utility  of  the  AFQT,  a  unit-weighted 
ASVAB Technical composite (AS, MC, and
EI), and the AO test in the Select21 predictive
validation study. The sample size varied by
analysis and consisted of 414 to 739 first-term
enlisted soldiers. After correction for range
restriction on the AFQT and criterion unreliability,
the validities for predicting general technical
proficiency were AFQT (.52), Technical
(.48),  and  AO  (.38).  When  all  three  scores  were 
used  together,  AO  provided  about  a  .03  incre¬ 
ment  in  validity  over  the  AFQT  and  Technical 
combined  scores  (.57  vs.  .54).  Noting  that  the 
AO  test  demonstrated  incremental  validity  be¬ 
yond  the  AFQT  and  Technical  scores  used  to¬ 
gether,  Russell  et  al.  stated  that  “Spatial  [AO] 
could  be  a  useful  predictor  beyond  the  ASVAB, 
not  just  beyond  AFQT”  (p.  68). 

We  recognize  that  additional  research  is 
needed  to  examine  the  effect  of  implementing 
multiple  nonverbal  reasoning  tests  in  military 
classification systems that are evaluated in
concert with the CS and AO tests, as well as the
existing  ASVAB  tests. 

Gender/Minority  Group  Score  Differences 

In  addition  to  demonstrating  predictive  valid¬ 
ity  and  incremental  validity,  one  of  the  criteria 
regarding  the  development  and  use  of  tests  for 
personnel measurement and selection/classification
is that they demonstrate the same
relations to occupational criteria for majority and
minority  groups  (lack  of  predictive  bias)  and 
that  group  mean  differences  are  minimized 
(lack  of  adverse  impact).  One  of  the  arguments 
for  adding  tests  to  the  ASVAB  that  do  not  rely 
on  learned  content  is  to  reduce  mean  score 
differences  between  majority  and  minority 
groups on service classification composites
(Drasgow  et  al.,  2006;  Wise  et  al.,  1992).  Wise 
and  colleagues  examined  the  sensitivity  and 
fairness of service ASVAB classification
composites, specifically those containing the
technical tests, for many Air Force, Army, Marine
Corps,  and  Navy  technical  occupations.  They 
observed  that  the  ASVAB  technical  composites 
were  generally  equally  fair  when  comparing 
regression  slopes  and  the  resulting  increases  in 
mean  criterion  scores  associated  with  increases 
in predictor scores across gender and racial
groups. However, adverse impact was noted to
some extent for the minority group. As a result,
Wise and colleagues recommended the services
consider adding valid tests to the ASVAB (or to
their classification systems) that reduced or
eliminated  barriers  to  occupational  assignments. 
In  response,  the  Navy  adopted  the  AO  test,  for 
which  mean  score  differences  between  the  ma¬ 
jority  group  (males  and  Whites)  and  each  of 
several  minority  groups  (females,  racial/ethnic 
minority  groups)  were  smaller  compared  to  dif¬ 
ferences  observed  on  the  ASVAB  technical 
tests.  This  section  does  not  address  regression 
slope  differences  (bias  or  fairness)  between 
groups, only adverse impact defined as group
mean  differences  in  ASVAB,  AO,  and  CS  tests. 

Table 4 (from Held et al., 2002) provides a
gender  and  race/ethnic  group  breakout  of  mean 
score  differences  for  the  ASVAB  tests,  AO,  and 
CS  calculated  as  effect  sizes  for  Navy  acces¬ 
sions"*  before  CS  was  eliminated  from  the  bat¬ 
tery  (and  during  the  AO  evaluation  phase). 

Table  4  also  shows  the  mean  differences  be¬ 
tween  Whites  and  racial/ethnic  minority  (Afri¬ 
can  American,  Hispanic,  Asian,  and  Native 
American)  groups  broken  out  by  gender  ex¬ 
pressed  as  effect  sizes.  Effect  sizes  were  calcu¬ 
lated  as  the  difference  between  the  majority 
(White) and the specific minority group mean
divided  by  the  pooled  group  standard  deviation 
(SD).  Cohen  (1988)  characterizes  standardized 
mean  score  differences  of  .2  as  small,  .5  as 
moderate, and .8 as large. For this study, an
effect size equal to or greater than |.5| was
considered a meaningful impact.
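
The effect size calculation described above can be sketched as follows. The text does not state which pooling formula was used; the sketch assumes the conventional degrees-of-freedom-weighted pooled standard deviation. The function names and the example means are illustrative only.

```python
import math


def pooled_sd(sd_major, n_major, sd_minor, n_minor):
    """Pooled SD weighted by degrees of freedom (assumed pooling formula)."""
    pooled_variance = (
        (n_major - 1) * sd_major ** 2 + (n_minor - 1) * sd_minor ** 2
    ) / (n_major + n_minor - 2)
    return math.sqrt(pooled_variance)


def effect_size(mean_major, sd_major, n_major, mean_minor, sd_minor, n_minor):
    """Standardized difference: majority mean minus minority mean."""
    return (mean_major - mean_minor) / pooled_sd(sd_major, n_major, sd_minor, n_minor)


def is_meaningful(d, threshold=0.5):
    """Apply the |.5| criterion used in this study."""
    return abs(d) >= threshold


# Hypothetical standard-score means and SDs for two groups.
d = effect_size(52.0, 10.0, 100, 47.0, 10.0, 100)
print(d, is_meaningful(d))
```

A positive value favors the majority group and a negative value favors the minority group, which matches the sign convention in Table 4.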

The  same  test  effect  size  patterns  were  ob¬ 
served  for  males  and  females  across  the  race/ 
ethnic  groups  (White  being  the  common  com¬ 
parison  group),  suggesting  cultural  differences. 
African  Americans  had  the  largest  number  of 
effect  size  differences  across  tests  and  gender 
followed by Hispanics and Asians. No meaningful
effect size differences were found for
Native  Americans  for  either  males  or  females. 
Not  considering  Native  Americans  further,  au¬ 
to/shop  (AS)  had  the  largest  effect  size,  favor¬ 
ing  Whites  (males  and  females),  for  the  three 
majority  and  minority  group  comparisons 
(White  vs.  African  Americans,  Hispanics,  and 
Asians).  The  effect  size  difference  (favoring 
Whites)  was  largest  for  African  Americans 
(1.13 for males and 1.09 for females). In contrast,
AO, when compared to AS, had trivial
effect sizes with the exception of African
Americans, where the effect size was .58 for both
males  and  females.  Both  CS  and  AO  had 
smaller  effect  sizes  than  any  of  the  technical 
knowledge  tests.  The  effect  size  for  CS  was 
trivial  across  all  groups  and  gender  with  the 
exception  of  a  small  .21  effect  size  in  the  com¬ 
parison  of  White  and  African  American  males 
(favoring  Whites). 

Figure  1  graphically  shows  the  effect  sizes 
for  the  ASVAB  and  CS  tests  for  gender.  It  is  a 
graphical  representation  of  the  Table  4  data 
collapsed  across  all  groups  (males,  N  =  35,831; 
females,  N  =  8,246). 

As  shown  in  Figure  1,  CS  was  the  only  test  to 
favor females, nearly reaching the |.5| effect size
criterion.  This  outcome  is  consistent  with  pre¬ 
vious  research  showing  that  females  outperform 
males  on  clerical  speed/accuracy  tests  (Majeres, 
1988)  and  processing  speed  tasks  involving  dig¬ 
its  and  letters  (Roivainen,  2011).  We  note  that 
this  female  advantage  has  not  been  found  to 
extend  to  reaction  time  tasks  where  males  have 
been  shown  to  outperform  females  (Roivainen, 
2011). 

The small mean score differences for men
and women observed for the CS and AO tests
compared to those for the technical knowledge
tests (GS, AS, MC, or EI) enable more women
to qualify for a broad range of jobs with no loss
of predictive validity or classification
effectiveness. Although the AO test seems appropriate
for  many  mechanical  occupations  as  a  substitute 
for  AS,  the  AO  test  as  yet  has  not  been  fully 
evaluated  across  all  types  of  Navy  occupations, 
but  will  be  in  the  near  future.  The  Air  Force  is 
in the process of evaluating the AO test in a
broader  array  of  occupations. 

We  note  that  a  case  is  not  being  made  to 
eliminate the ASVAB technical tests (GS, AS,
MC, and EI) but that it is possible to provide
alternative  ASVAB  standards  with  low  adverse 
impact  tests  (as  the  Navy  has  done  with  both 
AO  and  CS)  that  meet  or  exceed  the  validity  of 
the  technically  saturated  ASVAB  composites. 
The  technical  knowledge  tests  have  high  utility 
in military classification because they measure


'*  The  Held  et  al.  (2002)  data  are  for  Navy  accessions,  not 
applicants.  It  is  likely  that  group  effect  sizes  for  accessions 
are  smaller  than  those  for  applicants  due  to  selection  on  the 
ASVAB.

Table 4

Effect Size Analysis for Gender and Race/Ethnic Groups (FY99 Navy Accession Population)

         Male effect sizes, Whites (N = 22,230)        Female effect sizes, Whites (N = 4,454)
ASVAB    Af. Am.    Hisp.      Asian      Nat. Am.     Af. Am.    Hisp.      Asian      Nat. Am.
         N = 6,117  N = 4,049  N = \ J11  N = 1,523    N = 1,911  N = 1,005  N = 383    N = 410
GS       *0.93      *0.68      *0.78       0.03        *0.87      *0.68      *0.53       0.16
AR       *0.70       0.31       0.09       0.03        *0.62       0.29      -0.07       0.10
VE       *0.65      *0.59      *0.73      -0.01        *0.66      *0.57       0.45       0.12
MK        0.19       0.04      -0.42       0.05         0.11       0.02      -0.41       0.06
MC       *0.93       0.43       0.43      -0.01        *0.83       0.42       0.34      -0.03
AS       *1.13      *0.73      *1.04      -0.11        *1.09      *0.84      *0.94       0.01
EI       *0.76      *0.52       0.46      -0.01        *0.68      *0.61       0.39       0.14
AO       *0.58       0.18      -0.04      -0.05        *0.58       0.22       0.02      -0.03
CS        0.21       0.10      -0.08       0.06         0.17       0.18      -0.10       0.07

Note. * Denotes an effect size greater than |.5| (half a standard deviation), .5 being considered moderate. Effect
size was calculated as the major group mean (White) minus the minor group mean, the difference divided by
the pooled groups’ standard deviation. ASVAB = Armed Services Vocational Aptitude Battery; Af. Am. =
African American; Hisp. = Hispanic; Nat. Am. = Native American; GS = General Science; AR = Arithmetic
Reasoning; VE = Verbal; MK = Mathematics Knowledge; MC = Mechanical Comprehension; AS = Auto
and Shop Information; EI = Electronics Information; AO = Assembling Objects; CS = Coding Speed.


not  only  knowledge  of  the  subject  matter  rele¬ 
vant  to  training  and  jobs,  but  potentially  expe¬ 
rience  and  interest  that  results  in  motivated  en¬ 
gagement  in  technical  endeavors,  which 
involves/enhances  the  learning  process. 

Improved  Classification 

Improved classification was considered with
respect to two objectives: (a) increasing assignment
flexibility and (b) improving performance.

Figure 1. Fiscal Year 1999 Navy Armed Services Vocational
Aptitude Battery effect sizes for gender. See the
online article for the color version of this figure.

Increased assignment flexibility. During
the time that CS was being evaluated for elimination
from the ASVAB, the Navy was concerned
not only about lower predictive validity
from  losing  CS  and  increased  adverse  impact, 
but  also  about  losing  differential  assignment 
capability (Johnson & Zeidner, 1991).
Scholarios et al. (1994) showed that CS provided
differential  assignment  capability  as  well  as  in¬ 
creased  mean  predicted  performance  (Brogden, 
1951).  Without  the  use  of  CS,  the  Navy  was 
concerned  that  assignment  flexibility  would  be 
restricted,  resulting  in  an  increased  number  of 
applicants  who  could  not  be  assigned  to  jobs. 
The  Navy’s  evaluation  of  differential  assign¬ 
ment  capability,  described  more  fully  later  in 
this  paper,  took  the  form  of  simulating  recruit 
assignments  to  Navy  ratings  using  and  not  using 
CS  and  AO  in  ASVAB  classification  compos¬ 
ites.  The  objective  was  to  see  how  many  recruits 
would  not  be  assigned  across  all  ratings  given 
their  yearly  goals  (school  seats)  under  varying 
ASVAB,  CS,  and  AO  classification  scenarios. 
In  all  scenarios,  the  cut  scores  established  for 
the  composites  that  used  CS  or  AO  and  those 
that  did  not  were  effectively  set  to  be  the  same 
for  each  Navy  rating  (recognizing  that  to 
achieve a better fill rate across ratings all one
would  need  to  do  is  to  lower  the  cut  scores,  but 
with  the  expectation  of  lower  performance). 

Two  classification  algorithms  were  used  in 
the  Navy’s  recruit  assignments  to  ratings  simu¬ 
lation  studies.  The  first  was  developed  by  Navy 
Personnel  Research,  Studies,  and  Technology 
(Folchi, 2007; Folchi & Watson, 1997) and
operationalized by EDS Federal Engineering
and Logistics under contract (EDS Federal,
2001). The algorithm is now incorporated in the
Navy’s operational rating classification system
called the Rating Identification Engine
(Crookenden & Blanco, 2002). The algorithm’s
purpose is to generate a ranking of ratings
(occupations or alternatively specific jobs) to which a
person should be classified considering input
personnel data and two utility functions. One
utility function rewards applicants with high
ASVAB composite scores relevant to a particular
rating. The other function discourages
classifying largely overqualified applicants to
ratings considered not optimally challenging. Data
for recruits are entered into the classification
system  in  a  sequential  manner  (which  involves 
a  random  selection  component)  in  order  to 
mimic  the  operational  assignment  process  (in 
contrast  to  assigning  all  recruits  in  a  batch). 
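
The two-utility ranking idea can be illustrated with a toy sketch. The functional forms below (a linear reward for the margin above the cut score and a linear penalty once that margin exceeds a threshold) are hypothetical and are not the utility functions of the operational system; the rating names, composite names, scores, and cut scores are likewise invented for the example.

```python
def rank_ratings(applicant_scores, ratings, overqualified_margin=20.0, penalty_weight=2.0):
    """Rank the ratings an applicant qualifies for, best fit first.

    applicant_scores: dict mapping composite name -> applicant's score
    ratings: list of (rating_name, composite_name, cut_score) tuples
    Both utility terms are hypothetical functional forms.
    """
    ranked = []
    for rating_name, composite_name, cut_score in ratings:
        margin = applicant_scores[composite_name] - cut_score
        if margin < 0:
            continue  # below the cut score: not eligible for this rating
        reward = margin  # utility 1: reward higher composite scores
        # utility 2: penalize margins far above the cut (overqualification)
        penalty = penalty_weight * max(0.0, margin - overqualified_margin)
        ranked.append((reward - penalty, rating_name))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked]


example_scores = {"VE+MK+CS": 170, "AR+MC+AO": 160}
example_ratings = [
    ("Rating A", "VE+MK+CS", 160),
    ("Rating B", "AR+MC+AO", 120),
    ("Rating C", "AR+MC+AO", 145),
    ("Rating D", "VE+MK+CS", 175),
]
ranking = rank_ratings(example_scores, example_ratings)
print(ranking)
```

In this toy example the applicant does not qualify for Rating D, and Rating B drops to the bottom of the list because the applicant exceeds its cut score by a wide margin.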

Four  composite  sets  were  created  for  the  sim¬ 
ulation.  Composite  Set  1  (baseline)  contained 
no  ASVAB  composites  with  either  CS  or  AO. 
Composite  Set  2  contained  some  composites 
with  AO  where  predictive  validity  warranted 
the  test’s  use.  Likewise,  Composite  Set  3  con¬ 
tained  some  composites  with  CS  where  the  va¬ 
lidity  warranted  the  test’s  use.  Composite  Set  4 
contained  all  of  the  CS  and  AO  composites  for 
their  respective  ratings  along  with  the  remain¬ 
ing  ASVAB  composites  that  applied  to  ratings 
where the CS and AO tests did not add incremental
validity. Only one composite per rating
applied  in  all  simulation  scenarios  and  cut 
scores  were  set  (on  a  rating’s  counterpart  com¬ 
posite  across  scenarios)  to  qualify  the  same 
percentage  of  recruits.  Differential  assignment 
capability was defined as (a) the increase in the
percentage of the recruit population “assigned”
to ratings and (b) the lowest standard deviation
of fill rate (indicating even fill).
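
The two outcome measures can be computed from a simulation run as in the sketch below. It assumes that fill rate is the number assigned to a rating divided by that rating's goal and that the standard deviation is taken across ratings as a population standard deviation; neither detail is specified in the text, and the ratings and counts are invented for the example.

```python
from statistics import pstdev


def assignment_metrics(goals, assigned, n_recruits):
    """Return (proportion of recruits assigned, SD of fill rate across ratings).

    goals: dict mapping rating -> number of school seats to fill
    assigned: dict mapping rating -> number of recruits assigned
    """
    proportion_assigned = sum(assigned.values()) / n_recruits
    fill_rates = [assigned.get(rating, 0) / goal for rating, goal in goals.items()]
    return proportion_assigned, pstdev(fill_rates)


# Hypothetical run: two ratings with 10 seats each and 20 recruits.
proportion, fill_sd = assignment_metrics(
    goals={"Rating A": 10, "Rating B": 10},
    assigned={"Rating A": 10, "Rating B": 5},
    n_recruits=20,
)
print(proportion, fill_sd)
```

Comparing these two numbers across composite sets, with cut scores held equivalent, is what allows the simulation to separate gains in assignment flexibility from gains that would come merely from lowering standards.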

Over  80  Navy  ratings  with  their  associated 
recruitment goals for 4- or 5-year enlistment
training  programs  were  involved  in  the  study. 
All  were  referred  to  in  the  simulation  as  “jobs.” 
Males  and  females  were  simulated  in  separate 
analyses  because  some  jobs  were  not  open  to 
females.  Finally,  four  scenarios  were  applied  for 
the  four  sets  of  composites  that  varied  the  ratio 
of  “job  slots”  to  recruits  to  mimic  different 
recruiting  environments  (e.g.,  either  too  many 
or  not  enough  recruits  for  slots).  Table  5  shows 
the  results  of  the  simulations. 

Table 5 shows that, in each of the four
classification simulation scenarios, providing an
ASVAB composite set that included some
composites with the AO or CS tests, and especially
with both, resulted in fewer recruits
“unassigned” to jobs. Also, the standard deviation of
job fill (indicating evenness of distribution)
tended to decrease with the addition of
composites with the AO and CS tests. The obvious
exception is for scenario 4 (13.4% more male
jobs than males to assign) where everyone was
assigned to a job using the AO and CS composite
set, but with a less even fill across ratings
(standard deviations ranged from 18.2% to
19.4% for 4 runs, as compared to the high of
about 15% for the other runs).

Table 5

Rating Classification Simulation Results

                       Composite set     Composite set  Composite set  Composite set
                       without AO or CS  with AO        with CS        with AO and CS

Scenario 1: 1.7% fewer female jobs than females (8,134 jobs; 8,275 females)
Unassigned recruits    469               413            389            288-303 (range with 4 runs)
Job fill SD            16.1%             15.1%          14.6%          13.7-14.8%

Scenario 2: 2.5% more female jobs than females (8,484 jobs; 8,275 females)
Unassigned recruits    501               440            279            279-300 (range with 4 runs)
Job fill SD            20.2%             18.7%          16.7%          16.4-17.4%

Scenario 3: 6.2% more male jobs than males (38,402 jobs; 36,154 males)
Unassigned recruits    938               661            785            492-555 (range with 4 runs)
Job fill SD            13.8%             13.4%          13.0%          12.6-14.4%

Scenario 4: 13.4% more male jobs than males (40,995 jobs; 36,154 males)
Unassigned recruits    387               71             213            0 (range with 4 runs)
Job fill SD            15.9%             15.6%          15.6%          18.2-19.3%

Note. AO = Assembling Objects; CS = Coding Speed; SD = standard deviation.

The other Navy sequential assignment simulation application,
developed by the Lewin Group, Inc. (Hogan & Simonson, 2004), is used
as a decision tool when conducting Navy ASVAB validation/standards
studies. The algorithm assigns a recruit (drawn randomly from a prior
year’s recruit population that is an input file to the application)
to the rating (with annual goals or slot numbers for each rating
provided in a separate input file) for which the difference between
that recruit’s ASVAB composite score and the cut score of the
rating’s operational composite is a minimum. Despite this minimum
ASVAB delta criterion for a rating assignment (which involves a
random tie breaker routine when ties occur), all ratings end up with
a fairly wide range (distribution) of ASVAB scores to the right of
the cut score because there are a limited number of recruits with
ASVAB composite scores at the margin.
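
The minimum-delta rule described above can be sketched as follows. This is an illustrative reading of the prose, not the Lewin Group application itself: the data layout, the function name, and the composite labels (comp_a, comp_b) are all hypothetical, and the sketch assumes that a recruit scoring below a rating's cut score cannot be assigned to that rating.

```python
import random


def assign_min_delta(recruits, ratings, seed=0):
    """Sequentially assign recruits to the open rating with the smallest
    non-negative (composite score - cut score) difference.

    recruits: list of dicts mapping composite name -> score.
    ratings:  dict mapping rating name -> {"composite", "cut", "slots"}.
    Returns a list of rating names (None = unassigned), indexed like recruits.
    """
    rng = random.Random(seed)
    remaining = {name: r["slots"] for name, r in ratings.items()}
    order = list(range(len(recruits)))
    rng.shuffle(order)  # recruits are drawn in random order
    assignments = [None] * len(recruits)
    for i in order:
        best_delta, best = None, []
        for name, r in ratings.items():
            if remaining[name] == 0:
                continue  # rating goal already met
            delta = recruits[i][r["composite"]] - r["cut"]
            if delta < 0:
                continue  # assumption: below the cut score means unqualified
            if best_delta is None or delta < best_delta:
                best_delta, best = delta, [name]
            elif delta == best_delta:
                best.append(name)
        if best:
            choice = rng.choice(best)  # random tie breaker
            assignments[i] = choice
            remaining[choice] -= 1
    return assignments


# Hypothetical ratings and recruits.
ratings = {
    "X": {"composite": "comp_a", "cut": 100, "slots": 1},
    "Y": {"composite": "comp_b", "cut": 110, "slots": 1},
}
recruits = [
    {"comp_a": 105, "comp_b": 130},  # closest to the cut for X
    {"comp_a": 120, "comp_b": 112},  # closest to the cut for Y
    {"comp_a": 90, "comp_b": 100},   # qualifies for neither rating
]
result = assign_min_delta(recruits, ratings)
print(result)
```

Because each recruit goes to the rating whose cut score they clear by the least, recruits with larger margins stay available for ratings with higher cut scores.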

Consistent with the operational Rating Identification Engine
algorithm simulation study, the Lewin Group, Inc. (Hogan & Simonson,
2004) study also showed more recruits in the aggregate assigned to
ratings when the CS and AO composites were used compared to when they
were not.^ Not only was differential assignment improved (i.e., more
recruits being assigned to ratings at the same relative cut scores),
but the improvement came at lower recruiting costs. The lower
recruiting costs were due to a lower proportion of high AFQT recruits
with a high school diploma being required to fill the ratings, which
in turn was due to the lower correlation of CS and AO with the AFQT
tests used operationally for military selection. Traditionally, the
higher AFQT youth with a high school diploma are more expensive to
recruit.

Improved performance. Aside from improved flexibility, another
person-job-match goal considered was improved performance. Horst
(1954, 1955, 1956) and Brogden (1946, 1951, 1955, 1959) were early
pioneers in recognizing that an important outcome of improved
classification, termed CE, could be measured in terms of the overall
performance of those classified into jobs. The Army has been the main
driver among the services in research showing improvements in CE to
optimize person-job match under the framework of DAT that considers
many aspects of classification effectiveness (Johnson & Zeidner,
1995). In a study involving many cognitive and noncognitive measures
from the Army’s Project A, Scholarios et al. (1994) showed that the
inclusion of CS among the predictors increased mean predicted
performance (MPP) across a broad set of occupations and improved
differential assignment capability. In one experimental test battery
that involved the largest number of occupations, CS, in an optimally
derived equation, was selected first based upon its differential
assignment index. Although the Navy did not use the same statistical
methods for assessing CS differential assignment capability as did
Scholarios et al. (1994), their recruit occupational assignment
simulations demonstrated that lowering the average intercorrelation
of the ASVAB tests (which inclusion of CS does) will improve the
breadth of coverage of the cognitive domain.

Even from an economist’s viewpoint, there is recognized utility in
increasing the differential classification capability of the
performance predictor variables when considering assignment
algorithms. For example, Schmitz and Holz (1987) were keenly aware of
the differences between selection and classification in their
personnel assignment models:

Selection focuses on the differences among individuals, generally
using a single scale of value or utility. Applicants are classified
into two categories: those satisfactory for employment and those
not ....

Differential classification deals with differences within an
individual with respect to various skills. A particular individual
may have a high aptitude for mathematics but poor writing skills.
Another may have considerable talent for electronics jobs but poor
communication ability. Classification requires the use of two or more
different performance predictors. (p. 440)

^ The Lewin Group, Inc., application is used for many purposes,
including comparing diversity across ratings to the operational
classification state and also to assess the fill potential for women
in technical ratings (females historically score lower on the ASVAB
technical tests).

Johnson and Zeidner (1991, 1995) developed the principles of DAT that
considered the tenets of Brogden’s (1959) measure of CE but also the
formation of more or less homogeneous occupational groups upon which
to develop prediction equations that maximize MPP in the aggregate.
CE equates to MPP and captures both predictive validity and the
intercorrelation of the ordinary least squares (OLS) equation
estimates of performance (job or training). The CE formulas that are
outgrowths of the early Army pioneers’ work are presented succinctly
by Statman, Gribben, Naughton, and McCloy (1998, p. 7), as follows,
where MPP equates to CE:

MPP = R (1 - r)^{1/2} Z_m    (1)

where

MPP = mean predicted performance standard score of a group of
applicants assigned to m jobs,

R = average predictive validity of OLS estimates for all jobs,

r = average intercorrelation of the OLS estimates, and

Z_m = mean criterion standard score of the group after assignment to
the m jobs with equal vacancies (called quotas).

Statman  et  al.  (1998,  p.  7)  noted  that  R  (pre¬ 
dictive  validity)  is  positively  related  to  CE. 
However,  to  maximize  the  CE  index  of  MPP, 
the  intercorrelations  within  the  prediction  equa¬ 
tions  across  occupations  need  to  be  lowered. 
Both  CS  and  AO  help  to  lower  the  average 
ASVAB  test  intercorrelations  and  so  also  the 
intercorrelations  of  OLS  prediction  equations 
that contain them. Further applied research using the principles
developed by the classification theorists could be conducted on an
ASVAB that includes AO, CS, and all existing and new measures
potentially appropriate for the ASVAB or as adjuncts. This work could
be done considering all of the performance predictors in concert, not
in the piecemeal fashion that the Navy first took in its evaluation
of the benefits of the AO and CS tests.
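
As a worked illustration of the MPP formula, read here as R times the square root of (1 - r) times the post-assignment mean criterion score, the sketch below holds predictive validity fixed and lowers only the average intercorrelation. The numeric values are hypothetical and chosen only to show the direction of the effect; they are not estimates from any study cited here.

```python
import math


def mean_predicted_performance(R, r, z_m):
    """MPP = R * sqrt(1 - r) * Z_m.

    R   : average predictive validity of the OLS estimates
    r   : average intercorrelation of the OLS estimates
    z_m : mean criterion standard score after assignment to m jobs
    """
    return R * math.sqrt(1.0 - r) * z_m


# Hypothetical values: same validity, two levels of intercorrelation.
baseline = mean_predicted_performance(R=0.60, r=0.90, z_m=1.0)
lower_r = mean_predicted_performance(R=0.60, r=0.80, z_m=1.0)
print(f"r = .90: MPP = {baseline:.3f}")
print(f"r = .80: MPP = {lower_r:.3f}")
```

With validity unchanged, the lower intercorrelation yields the larger MPP, which is the mechanism by which tests such as CS and AO are expected to help.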

Discussion 

Drasgow et al. (2006) recommended that cognitive content be added to
the ASVAB that was not dependent on verbal skills or acquired
knowledge. To this end, we examined both theoretical and empirical
support for two tests, CS (speed/accuracy) and AO (spatial), for use
in military occupational classification. We focused on three factors
of particular importance to the U.S. military: (a) incremental
predictive validity when these tests are used in combination with the
ASVAB academic and technical knowledge tests for predicting important
occupational criteria, (b) reduced subgroup differences (adverse
impact) for women and some minority groups, and (c) improved
classification, both in terms of increased assignment flexibility and
improved performance, due to enhanced differential assignment
capability.

Our  comprehensive  evaluation  approach  was 
consistent  with  the  recommendation  from  Dras¬ 
gow  et  al.  (2006,  p.  25)  that  evaluations  of  the 
benefit  of  adding  a  test  to  the  ASVAB  go  be¬ 
yond  examination  of  incremental  validity  to  in¬ 
clude  reduced  adverse  impact  and  improved 
classification  efficiency.  Our  analyses  indicated 
that,  in  aggregate,  inclusion  of  AO  and  CS  in 
Navy  classification  composites  provided  small 
increments  in  predictive  validity,  reduced  ad¬ 
verse  impact  for  women  and  some  minority 
groups,  and  improved  classification  in  terms  of 
increased  assignment  flexibility  and,  theoreti¬ 
cally  according  to  formal  CE  methods,  im¬ 
proved  mean  performance. 

Incremental  Validity 

Despite  the  AO  and  CS  tests  being  based  on 
different  concepts  of  cognitive  ability  and  con¬ 
tent  domains  than  the  ASVAB  verbal,  math,  and 
technical  knowledge  tests,  only  small  incre¬ 
ments  in  predictive  validity  (about  .02  on  aver¬ 
age)  were  observed  in  the  studies  reported  in 
this  paper.  The  overall  mean  incremental  pre¬ 
dictive  validity  for  CS  and  AO  was  close  to  that 
estimated  by  Schmidt  et  al.  (1995),  who  ac¬ 
knowledged  that  although  the  .02  validity  incre¬ 
ment  seems  small,  it  has  the  potential  to  pro¬ 
duce  substantial  cost  savings  in  large  scale 
testing  programs  such  as  the  ASVAB  (used  for 
military  occupational  classification). 

Given the theoretical and empirical evidence supporting the use of
the AO and CS tests, it is instructive to discuss the reason for the
observed small .02 increment in validity these tests contribute to
the ASVAB verbal, math, and technical knowledge tests. To some
extent, the small incremental predictive validity for CS and AO
simply reflects that all cognitive tests measure, to varying extents,
general mental ability. Confirmatory factor analyses have shown that
both CS (Ree & Carretta, 1994) and AO (Drasgow et al., 2006)
contribute strongly to a general factor derived from all of the ASVAB
tests and interpreted as psychometric g. These results are consistent
with studies that have compared cognitive tests developed from
different theoretical approaches only to find they mainly measured g
(e.g., Keith, Kranzler, & Flanagan, 2001; Stauffer, Ree, & Carretta,
1996). One explanation for the high degree of overlap in what is
being measured by tests developed on different theoretical bases
(such as Gc and Gf) and the observed low incremental validity for CS
and AO in the studies reviewed in this paper can be found in what
Spearman (1923) called the “indifference of the indicator” or
“indifference of the fundaments.” This means that test content is not
fundamental to the measurement of g. The test content is the vehicle
that allows for the expression of relationships and differences
measured in cognitive ability tests. Cognitive ability can be and has
been measured with verbal, quantitative, and spatial test items of
widely varying content. Measures of nonverbal reasoning (e.g.,
Raven’s Matrices; Raven, 1939) have no verbal or quantitative content
and little spatial content, yet measure g. Chronometric measures
(i.e., cognitive speed) that require no verbal, quantitative, or
spatial content also have been shown to measure g (Jensen, 2006).

Despite  the  viewpoint  of  a  predominant  g 
factor,  there  is  utility  to  tests  like  CS  and  AO  in 
the  large-scale  ASVAB  testing  program  be¬ 
cause,  despite  these  tests’  low  incremental  va¬ 
lidity,  they  apply  to  many  occupations  and  over 
time  many  individuals  will  be  better  fit  to  oc¬ 
cupations  with  the  expectation  of  better  training 
and  by  extension,  job  performance.  Also,  use  of 
the  CS  and  AO  tests  unquestionably  lowers 
adverse  impact  or  test  score  barriers. 

Adverse  Impact 

When  evaluating  new  predictors,  the  primary 
focus  of  much  of  the  research  has  been  on  their 
incremental  validity  over  existing  predictors 
(e.g.,  Besetsny,  Earles,  &  Ree,  1993;  Wolfe, 
1997;  Wolfe  et  al.,  1997).  While  incremental 
validity  is  desirable  when  adding  test  content  to 
a  battery  such  as  the  ASVAB,  it  may  not  be 
necessary  if  the  new  measure  can  replace  an 
existing  test  with  no  loss  of  validity  but  provide 
some  other  benefit  such  as  demonstrating 
smaller mean subgroup differences. As was seen, smaller mean subgroup
differences can expand the occupation-qualified recruit pool.

The  most  significant  contributions  of  the  CS 
and  AO  tests  to  assignment  of  military  enlisted 
personnel  to  occupations  may  be  their  potential 
to  reduce  adverse  impact  for  females  and  some 
minority  groups.  The  potential  to  reduce  ad¬ 
verse  impact  is  due  to  less  reliance  on  learned 
content  than  is  the  case  for  the  ASVAB  verbal, 
math,  and  technical  knowledge  tests.  The  Army 
recognized the CS test’s benefit in this regard during an exploration
of methods for measuring ASVAB composite fairness after the NO and CS
tests were eliminated. Based on their evaluation, Zeidner, Johnson,
Vladimirsky, and Weldon (2004) recommended that the CS test be
restored to the ASVAB to reduce adverse impact and increase MPP.

Improved  Classification  Outcomes 

The  potential  of  CS  and  AO  to  improve  clas¬ 
sification  flexibility  by  increasing  the  number  of 
occupations  recruits  can  qualify  for  without 
compromising  aggregate  expected  performance 
is  part  of  improving  CE  and  was  shown  in  this 
paper  through  two  Navy  rating  assignment  sim¬ 
ulation  studies.  Limited  work  thus  far  has  been 
done  by  the  Navy  to  address  the  formal  tenets
and  formulas  the  Army  uses  to  assess  CE  within 
the  DAT  framework.  Applied  work  will  be 
planned  to  do  so  when  test  score  data  are  avail¬ 
able  for  all  of  the  tests  scheduled  for  evaluation 
as  formal  ASVAB  tests  or  as  ASVAB  adjuncts 
and  as  military  performance  data  become  avail¬ 
able  for  comprehensive  test  validation  studies. 
However,  the  work  done  by  the  Army  at  least 
supports  the  expectation  that  the  inclusion  of  CS 
would  enhance  the  outcome  of  mean  predicted 
performance. 

Given the Army’s and Navy’s various independent assessments of the CS
and AO tests and the recent DMDC finding of a more robust CS test
than previously thought, the next step could be for the services to
participate in a joint-service evaluation of the two tests (on the
three criteria presented in this paper). Taken together, the
empirical support for the CS and AO tests indicates multiple benefits
to the services in (a) increasing their ability to predict which
recruits will perform well in training and thus establishing more
effective cognitive ability cut scores, (b) lowering score barriers
for women and some minority groups in qualifying for important
occupations and thus improving the diversity that is highly valued in
the military, and (c) increasing the overall percentage of recruits
qualified for classification/assignment to occupations with the same
previously expected success rates. All of the services currently
administer the AO test in the ASVAB Enlisted Testing Program (both
CAT-ASVAB and paper-and-pencil ASVAB) and it should not be much of an
additional testing burden to administer the 7-min CS test at the MEPS
to all service applicants. The only additional burden would be to
ensure high quality levels for the criterion used for the
validity/incremental validity and CE portions of the analysis.

New  Test-Item  Development 

Another  advantage  of  CS  and  AO  that  has  not 
yet  been  discussed  and  that  applies  to  other 
measures  that  do  not  use  academic  or  knowl¬ 
edge-based  content  is  the  relative  ease  and 
lower  expense  of  new  test-item  development 
(compared  with  measures  of  learned  content). 
That  is,  tests  like  AO  and  CS  and  the  candidate 
ASVAB  measures  of  nonverbal  reasoning  and 
working memory can be developed as automated item-generated tests.
Automated item-generated tests have the potential to (a) reduce item
writer requirements and test maintenance levels compared to that
required for academic and knowledge-based tests, (b) alleviate the
need to continually review test content for outdated material, and
(c) minimize the monitoring efforts needed to identify test security
and test compromise issues. Further, tests like AO, CS, and measures
of nonverbal reasoning and working memory can be made adaptive like
the ASVAB tests, therefore reducing testing time.
Conversion  to  an  adaptive  format  could  also 
improve  measurement  precision  but  requires  a 
psychometric  balancing  act. 

Limitations  and  Future  Studies 

There  are  limitations  to  the  analyses  reported 
in  this  paper  supporting  the  CS  and  AO  tests 
that  include  (a)  some  occupations  having  small 
sample  sizes,  (b)  not  having  a  large  variety  of 
very  different  types  of  occupations,  (c)  not  eval¬ 
uating  the  inclusion  of  both  AO  and  CS  in  the 
same classification composites, and (d) not having all potential or
adjunct ASVAB tests to compare outcomes.

As  noted  earlier,  DMDC  and  the  services 
plan  to  evaluate  new  measures  of  nonverbal 
reasoning  and  working  memory  for  potential 
inclusion  on  the  ASVAB  or  as  special  tests 
(adjuncts  to  the  ASVAB).  These  tests  include 
Table  Processing  (Segall,  2010),  a  purer  mea¬ 
sure  of  processing  speed  than  CS  and  a  possible 
replacement  or  addition,  and  an  updated  version 
of  the  ECAT  Mental  Counters  (MCt)  test, 
which  measures  working  memory.  The  ECAT 
MCt  test  demonstrated  incremental  validity  for 
predicting  military  air  traffic  controller  perfor¬ 
mance  (Held  &  Wolfe,  1997)  and  is  expected  to 
contribute  to  an  operational  Navy  ASVAB 
composite  in  2015.  MCt  also  is  expected  to  be 
useful  for  other  military  occupations  for  which 
working memory has been identified as important (e.g., unmanned
aerial vehicle operators; Paullin, Ingerick, Trippe, & Wasko, 2011).
Further, MCt has been included in the new version
of  the  Defense  Language  Aptitude  Battery, 
which  will  be  further  evaluated  for  predicting 
foreign  language  learning  outcomes  in  a  valid¬ 
ity  confirmation  study. 

Recent changes in technology and job requirements in the military
(e.g., wider application of unmanned aerial vehicle platforms and
cyber/IT) may show even more relevance for CS, AO, and the other
tests now being evaluated for military occupational classification.
The new performance predictor tests
at  the  very  least  are  expected  to  have  benefits 
similar  to  those  demonstrated  for  CS  and  AO  in 
this  paper.  That  is,  the  nonverbal  reasoning  and 
working  memory  tests  are  expected  to  show  at 
least  small  amounts  of  incremental  validity  to 
current  ASVAB  tests,  reduce  adverse  impact  for 
females  and  some  minority  groups  in  qualifying 
for  some  occupations,  and  increase  the  propor¬ 
tion  of  recruit  populations  qualified  for  occupa¬ 
tions  on  a  cognitive  basis. 